Search results: all records where Creators/Authors contains "Shao, Helen"

  1. The sum of neutrino masses can be measured cosmologically, as the sub-eV particles behave as “hot” dark matter whose main effect is to suppress the clustering of matter compared to a universe with the same amount of purely cold dark matter. Current astronomical data provide an upper limit on ∑mν of between 0.07 and 0.12 eV at 95% confidence, depending on the choice of data. This bound assumes that the cosmological model is Λ Cold Dark Matter (ΛCDM), where dark energy is a cosmological constant, the spatial geometry is flat, and the primordial fluctuations follow a pure power law. Here, we update studies of how the mass limit degrades if we relax these assumptions. To existing data from the satellite we add new gravitational lensing data from the Atacama Cosmology Telescope, the new Type Ia supernova sample from the Pantheon+ survey, and baryonic acoustic oscillation (BAO) measurements from the Sloan Digital Sky Survey and the Dark Energy Spectroscopic Instrument. Using our fiducial data combination, described in the appendix, we find that the neutrino mass limit is stable to most model extensions, which degrade the limit by less than 10%. The broadest bound we find is ∑mν < 0.19 eV at 95% confidence, for a model with dynamical dark energy, although this scenario is not statistically preferred over the simpler ΛCDM model.
    Free, publicly-accessible full text available April 1, 2026
  2. We discover analytic equations that can infer the value of Ωm from the positions and velocity moduli of halo and galaxy catalogs. The equations are derived by combining a tailored graph neural network (GNN) architecture with symbolic regression. We first train the GNN on dark matter halos from Gadget N-body simulations to perform field-level likelihood-free inference, and show that our model can infer Ωm with ∼6% accuracy from halo catalogs of thousands of N-body simulations run with six different codes: Abacus, CUBEP³M, Gadget, Enzo, PKDGrav3, and Ramses. By applying symbolic regression to the different parts of the GNN, we derive equations that can predict Ωm from halo catalogs of simulations run with all of the above codes, with accuracies similar to those of the GNN. We show that, by tuning a single free parameter, our equations can also infer the value of Ωm from galaxy catalogs of thousands of state-of-the-art hydrodynamic simulations of the CAMELS project, each with a different astrophysics model, run with five distinct codes that employ different subgrid physics: IllustrisTNG, SIMBA, Astrid, Magneticum, and SWIFT-EAGLE. Furthermore, the equations also perform well when tested on galaxy catalogs from simulations covering a vast region in parameter space that samples variations in 5 cosmological and 23 astrophysical parameters. We speculate that the equations may reflect the existence of a fundamental physics relation between the phase-space distribution of generic tracers and Ωm, one that is not affected by galaxy formation physics down to scales as small as 10 h⁻¹ kpc.
  3. We train graph neural networks to perform field-level likelihood-free inference using galaxy catalogs from state-of-the-art hydrodynamic simulations of the CAMELS project. Our models are rotationally, translationally, and permutationally invariant and do not impose any cut on scale. From galaxy catalogs that contain only the 3D positions and radial velocities of ∼1000 galaxies in tiny (25 h⁻¹ Mpc)³ volumes, our models can infer the value of Ωm with approximately 12% precision. More importantly, by testing the models on galaxy catalogs from thousands of hydrodynamic simulations, each having a different efficiency of supernova and active galactic nucleus feedback, run with five different codes and subgrid models (IllustrisTNG, SIMBA, Astrid, Magneticum, SWIFT-EAGLE), we find that our models are robust to changes in astrophysics, subgrid physics, and subhalo/galaxy finder. Furthermore, we test our models on 1024 simulations that cover a vast region in parameter space (variations in five cosmological and 23 astrophysical parameters), finding that the model extrapolates well. Our results indicate that the key to building a robust model is the use of both galaxy positions and velocities, suggesting that the network has likely learned an underlying physical relation that does not depend on galaxy formation and is valid on scales larger than ∼10 h⁻¹ kpc.
  4. We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model of the ASTRID simulation and contains 2124 hydrodynamic simulation runs that vary three cosmological parameters (Ωm, σ8, Ωb) and four parameters controlling stellar and active galactic nucleus (AGN) feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the smallest baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in galaxy populations and in the baryonic impact on the matter power spectrum than its TNG and SIMBA counterparts, which can help machine-learning models trained on the ASTRID suite extrapolate better when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex nonlinear interplay between cosmology and astrophysical processes. With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.
  5. We train graph neural networks on halo catalogs from Gadget N-body simulations to perform field-level likelihood-free inference of cosmological parameters. The catalogs contain ≲5000 halos with masses ≳10¹⁰ h⁻¹ M⊙ in a periodic volume of (25 h⁻¹ Mpc)³; every halo in the catalog is characterized by several properties such as position, mass, velocity, concentration, and maximum circular velocity. Our models, built to be permutationally, translationally, and rotationally invariant, do not impose a minimum scale on which to extract information and are able to infer the values of Ωm and σ8 with a mean relative error of ∼6% when using positions plus velocities and positions plus masses, respectively. More importantly, we find that our models are very robust: they can infer the values of Ωm and σ8 when tested using halo catalogs from thousands of N-body simulations run with five different N-body codes: Abacus, CUBEP³M, Enzo, PKDGrav3, and Ramses. Surprisingly, the model trained to infer Ωm also works when tested on thousands of state-of-the-art CAMELS hydrodynamic simulations run with four different codes and subgrid physics implementations. Using halo properties such as concentration and maximum circular velocity allows our models to extract more information, at the expense of the models' robustness. This may happen because the different N-body codes are not converged on the scales relevant to these properties.
  6. We use a generic formalism designed to search for relations in high-dimensional spaces to determine whether the total mass of a subhalo can be predicted from other internal properties such as velocity dispersion, radius, or star formation rate. We train neural networks using data from the Cosmology and Astrophysics with MachinE Learning Simulations project and show that the model can predict the total mass of a subhalo with high accuracy: more than 99% of the subhalos have a predicted mass within 0.2 dex of their true value. The networks exhibit surprising extrapolation properties, being able to accurately predict the total mass of any type of subhalo containing any kind of galaxy at any redshift from simulations with different cosmologies, astrophysics models, subgrid physics, volumes, and resolutions, indicating that the network may have found a universal relation. We then use different methods to find equations that approximate the relation found by the networks and derive new analytic expressions that predict the total mass of a subhalo from its radius, velocity dispersion, and maximum circular velocity. We show that in some regimes, the analytic expressions are more accurate than the neural networks. The relation found by the neural network and approximated by the analytic equations bears similarities to the virial theorem.
  7. We present the Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) Multifield Data set (CMD), a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from more than 2000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span ∼100 million light-years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine-learning models, CMD is the largest data set of its kind, containing more than 70 TB of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.
  8. The Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4233 cosmological simulations: 2049 N-body simulations and 2184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper, we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogs, power spectra, bispectra, Lyα spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over 1000 catalogs that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz semianalytic model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.
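Item 1 describes massive neutrinos as "hot" dark matter that suppresses matter clustering. One rough way to gauge the size of that effect is the textbook linear-theory approximation ΔP/P ≈ −8 fν on small scales, with fν = Ων/Ωm and Ων h² ≈ ∑mν / 93.14 eV. The sketch below assumes those approximations and illustrative values (Ωm = 0.31, h = 0.67); it is not the likelihood analysis used in the abstract.

```python
# Sketch: linear-theory estimate of how massive neutrinos suppress small-scale
# matter clustering, using the standard approximations
#   Omega_nu h^2 = sum_mnu / 93.14 eV   and   Delta P / P ~ -8 f_nu.
# These are textbook results, not the pipeline described in the abstract.


def neutrino_fraction(sum_mnu_ev: float, omega_m: float, h: float) -> float:
    """Return f_nu = Omega_nu / Omega_m for a neutrino mass sum given in eV."""
    omega_nu = sum_mnu_ev / 93.14 / h**2  # convert physical density to Omega_nu
    return omega_nu / omega_m


def small_scale_suppression(sum_mnu_ev: float, omega_m: float, h: float) -> float:
    """Approximate fractional drop of the small-scale linear matter power spectrum."""
    return 8.0 * neutrino_fraction(sum_mnu_ev, omega_m, h)


if __name__ == "__main__":
    # Illustrative cosmology (assumed values, not taken from the abstract).
    omega_m, h = 0.31, 0.67
    for mnu in (0.06, 0.12, 0.19):
        f_nu = neutrino_fraction(mnu, omega_m, h)
        supp = small_scale_suppression(mnu, omega_m, h)
        print(f"sum m_nu = {mnu:.2f} eV: f_nu = {f_nu:.4f}, suppression ~ {supp:.1%}")
```

With these assumed values, a mass sum of 0.12 eV gives fν of roughly 0.9% and a small-scale suppression of roughly 7%, which illustrates why sub-eV masses are only detectable through precise clustering measurements.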
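Items 2, 3, and 5 rely on graph neural networks that are invariant to permutations, translations, and rotations of the catalog. A common way to obtain these properties, sketched here as an assumption rather than the authors' architecture, is to link tracers within a radius in the periodic box and give the network only invariant quantities (separations, speeds, and cosines between separation vectors and velocities), followed by sum aggregation. The function names, the linking radius, and the feature choice are all hypothetical.

```python
import numpy as np


def build_invariant_graph(pos, vel, box_size, r_link):
    """Hypothetical helper: link tracers closer than r_link in a periodic box.

    Returns (src, dst, node_feat, edge_feat). Every feature is a length or a
    dot product, so it is unchanged by reordering the catalog, by shifting it
    with periodic wrapping, and by rotations that map the cubic box onto itself.
    """
    n = len(pos)
    # Minimum-image displacement d[i, j] = pos[j] - pos[i]; O(N^2) memory.
    d = pos[None, :, :] - pos[:, None, :]
    d -= box_size * np.round(d / box_size)
    dist = np.linalg.norm(d, axis=-1)
    src, dst = np.nonzero((dist < r_link) & ~np.eye(n, dtype=bool))

    sep = dist[src, dst]
    unit = d[src, dst] / sep[:, None]          # unit vector from src to dst
    speed = np.linalg.norm(vel, axis=1)         # velocity modulus of each tracer
    eps = 1e-12                                 # guards against zero speed
    cos_src = np.einsum("ij,ij->i", unit, vel[src]) / (speed[src] + eps)
    cos_dst = np.einsum("ij,ij->i", unit, vel[dst]) / (speed[dst] + eps)

    edge_feat = np.stack([sep / r_link, cos_src, cos_dst], axis=1)
    node_feat = speed[:, None]
    return src, dst, node_feat, edge_feat


def sum_readout(node_feat, dst, edge_feat):
    """Toy permutation-invariant readout: sum incoming edge features per node,
    append the node features, then sum over all nodes."""
    agg = np.zeros((len(node_feat), edge_feat.shape[1]))
    np.add.at(agg, dst, edge_feat)
    return np.concatenate([node_feat, agg], axis=1).sum(axis=0)
```

In a trained model the edge and node features would pass through learned layers before aggregation; the toy readout only demonstrates that shuffling, periodically shifting, or rotating the catalog by 90° about a box axis leaves the pooled vector unchanged up to rounding. The O(N²) pair search is reasonable only for the few thousand tracers per box quoted above; a k-d tree is the usual replacement for larger catalogs.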
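Item 4 describes simulation sets that vary three cosmological parameters (Ωm, σ8, Ωb) and four feedback parameters. Latin-hypercube designs are a common choice for laying out such parameter sets, and the sketch below shows that sampling scheme under stated assumptions. The parameter names A_SN1, A_SN2, A_AGN1, A_AGN2 and every range are illustrative placeholders, not the published CAMELS-ASTRID priors, and drawing the feedback amplitudes log-uniformly is likewise an assumption.

```python
import numpy as np

# Illustrative placeholder ranges (assumed; not the published CAMELS-ASTRID priors).
LINEAR_RANGES = {"Omega_m": (0.1, 0.5), "sigma_8": (0.6, 1.0), "Omega_b": (0.03, 0.07)}
LOG_RANGES = {"A_SN1": (0.25, 4.0), "A_SN2": (0.5, 2.0),
              "A_AGN1": (0.25, 4.0), "A_AGN2": (0.5, 2.0)}


def latin_hypercube(n_samples, n_dims, rng):
    """Unit-cube Latin hypercube: in every dimension, each of the n_samples
    equal-width strata contains exactly one point."""
    strata = np.tile(np.arange(n_samples), (n_dims, 1))  # shape (n_dims, n_samples)
    strata = rng.permuted(strata, axis=1).T               # shuffle each dimension independently
    jitter = rng.uniform(size=(n_samples, n_dims))        # random offset inside each stratum
    return (strata + jitter) / n_samples


def sample_parameters(n_samples, seed=0):
    """Map a Latin hypercube onto the placeholder parameter ranges above."""
    rng = np.random.default_rng(seed)
    names = list(LINEAR_RANGES) + list(LOG_RANGES)
    u = latin_hypercube(n_samples, len(names), rng)
    params = {}
    for j, name in enumerate(names):
        if name in LINEAR_RANGES:
            lo, hi = LINEAR_RANGES[name]
            params[name] = lo + (hi - lo) * u[:, j]   # uniform in the parameter
        else:
            lo, hi = LOG_RANGES[name]
            params[name] = lo * (hi / lo) ** u[:, j]  # uniform in log(parameter)
    return params
```

Compared with independent uniform draws, the stratification guarantees that every slice of each parameter's range is covered once, which is useful when each sample costs a full hydrodynamic simulation.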
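Item 6 compares the learned subhalo-mass relation to the virial theorem and scores predictions by whether they fall within 0.2 dex of the true mass. The sketch below shows both ideas in their simplest textbook form: a virial-style estimate M = k σ² R / G, where the prefactor k is a hypothetical order-unity constant, and the fraction of objects within a given tolerance in log10 mass. The abstract's own analytic expressions also use the maximum circular velocity and are not reproduced here.

```python
import numpy as np

G_KPC = 4.30091e-6  # Newton's constant in kpc (km/s)^2 / Msun


def virial_mass(sigma_kms, radius_kpc, k=1.0):
    """Virial-style estimate M = k * sigma^2 * R / G, in Msun.

    k is a hypothetical order-unity prefactor; its actual value depends on the
    density profile and on how sigma and R are defined.
    """
    sigma = np.asarray(sigma_kms, dtype=float)
    radius = np.asarray(radius_kpc, dtype=float)
    return k * sigma**2 * radius / G_KPC


def fraction_within_dex(m_pred, m_true, tol_dex=0.2):
    """Fraction of objects whose predicted mass lies strictly within tol_dex of the truth."""
    log_pred = np.log10(np.asarray(m_pred, dtype=float))
    log_true = np.log10(np.asarray(m_true, dtype=float))
    return float(np.mean(np.abs(log_pred - log_true) < tol_dex))
```

For example, σ = 100 km/s and R = 100 kpc with k = 1 give roughly 2.3 × 10¹¹ M⊙, and a tolerance of 0.2 dex corresponds to a factor of about 1.58 in mass either way.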
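Item 8 lists power spectra among the released data products, and item 7 describes 3D grids of simulated fields. As a rough illustration of how a spherically averaged P(k) can be estimated from such a grid, here is a minimal FFT-based sketch. It assumes a real overdensity field δ on a periodic N³ grid, ignores mass-assignment window and shot-noise corrections, and is not the pipeline used to produce the CAMELS spectra.

```python
import numpy as np


def power_spectrum(delta, box_size, n_bins=16):
    """Spherically averaged power spectrum of a real field on an N^3 periodic grid.

    Returns (k_centers, pk, counts). Uses the continuum Fourier convention, so a
    unit-variance white-noise field gives P(k) = (box_size / N)^3.
    """
    n = delta.shape[0]
    cell_volume = (box_size / n) ** 3
    delta_k = np.fft.rfftn(delta) * cell_volume           # shape (n, n, n//2 + 1)
    power = np.abs(delta_k) ** 2 / box_size**3

    kf = 2.0 * np.pi / box_size                           # fundamental frequency
    k_full = np.fft.fftfreq(n, d=1.0 / n) * kf            # axes 0 and 1
    k_half = np.fft.rfftfreq(n, d=1.0 / n) * kf           # last axis (rfftn keeps kz >= 0)
    kmag = np.sqrt(k_full[:, None, None] ** 2
                   + k_full[None, :, None] ** 2
                   + k_half[None, None, :] ** 2)

    # Linear bins from the fundamental up to the 1D Nyquist frequency; the k = 0
    # mode and corner modes beyond Nyquist fall outside the bins and are dropped.
    edges = np.linspace(kf, kf * n / 2, n_bins + 1)
    idx = np.digitize(kmag.ravel(), edges) - 1
    keep = (idx >= 0) & (idx < n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    sums = np.bincount(idx[keep], weights=power.ravel()[keep], minlength=n_bins)
    with np.errstate(divide="ignore", invalid="ignore"):
        pk = sums / counts                                # NaN for empty bins
    return 0.5 * (edges[:-1] + edges[1:]), pk, counts
```

A unit-variance white-noise field should return a flat spectrum equal to the cell volume (L/N)³, which is a quick normalization check. Averaging over the half-space of modes returned by rfftn is adequate for a sketch because each retained mode has the same power as its complex conjugate.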